anshumali shrivastava
Breaking the Linear Iteration Cost Barrier for Some Well-known Conditional Gradient Methods Using MaxIP Data-structures
Conditional gradient methods (CGM) are widely used in modern machine learning. CGM's overall running time usually consists of two parts: the number of iterations and the cost of each iteration. Most efforts focus on reducing the number of iterations as a means to reduce the overall running time. In this work, we focus on improving the per iteration cost of CGM. The bottleneck step in most CGM is maximum inner product search (MaxIP), which requires a linear scan over the parameters. In practice, approximate MaxIP data-structures are found to be helpful heuristics. However, theoretically, nothing is known about the combination of approximate MaxIP data-structures and CGM. In this work, we answer this question positively by providing a formal framework to combine the locality sensitive hashing type approximate MaxIP data-structures with CGM algorithms. As a result, we show the first algorithm, where the cost per iteration is sublinear in the number of parameters, for many fundamental optimization algorithms, e.g., Frank-Wolfe, Herding algorithm, and policy gradient.
a1e865a9b1065392ed6035d8ccd072d9-Paper.pdf
Unfortunately,the per-iteration cost of maintaining this adaptivedistribution for gradient estimation is more than calculating the full gradient itself, which we call the chicken-and-the-egg loop. As a result, the false impression of faster convergence in iterations, inreality,leads to slower convergence in time.
c164bbc9d6c72a52c599bbb43d8db8e1-Paper.pdf
Deep neural networks have achieved impressive performance in many areas. Designing a fast and provable method for training neural networks is a fundamental question in machine learning. The classical training method requires paying ฮฉ(mnd) cost for both forward computation and backward computation, where m is the width of the neural network, and we are given n training points in d-dimensional space.
2fc6b8a3fc23108f184daa4759024c25-Paper-Conference.pdf
IntheDistanceOracle problem,thegoalistopreprocess nvectorsx1,x2,...,xn in a d-dimensional metric space(Xd, l) into a cheap data structure, so that given a query vectorq Xd and a subsetS [n] of the input data points, all distances q xi l forxi S canbequicklyapproximated(fasterthanthetrivial d|S|querytime).